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Introduction 

Embedded  DSP  (digital  signal  processing)  applications  are  typically  implemented  using  fixed  point  arith¬ 
metic;  in  hardware  to  reduce  area  requirements  and  increase  throughput,  but  also  in  software  since  most 
embedded  processors  do  not  provide  floating  point  arithmetic.  Consequently,  the  developer  is  confronted 
with  the  difficult  task  of  deciding  on  the  fixed  point  format,  i.e.,  the  number  of  integer  and  fractional  bits 
to  avoid  overflow  and  ensure  sufficient  accuracy.  For  software  implementations,  the  entire  bitwidth  is  fixed, 
typically  at  32,  which  means  that  increasing  the  representable  range  (number  of  integer  bits)  reduces  the 
available  accuracy  (number  of  fractional  bits)  and  vice-versa. 

In  this  paper  we  present  a  compiler  that  translates  a  floating  point  C  implementation  of  a  linear  DSP 
kernel,  such  as  a  discrete  Fourier  or  wavelet  transform,  into  a  high  accuracy  fixed  point  C  implementation. 
The  inputs  to  the  compiler  are  a  floating  point  arithmetic  C  program  and  the  range  of  the  input  vector 
elements.  First,  the  compiler  statically  analyzes  the  program  in  a  single  pass  using  a  recently  developed  tool 
that  uses  affine  arithmetic  modeling  [1].  Then,  in  the  global  mode,  the  compiler  determines  the  global  fixed 
point  format  with  the  least  number  of  integer  bits  (and  thus  the  highest  accuracy)  that  guarantees  to  avoid 
overflow  and  outputs  the  corresponding  code.  More  interesting  is  the  local  mode,  in  which  the  compiler 
determines  the  best  format  independently  for  each  variable,  thus  further  pushing  the  possible  accuracy.  The 
compiler  is  currently  limited  to  straightline  code;  an  extension  to  loop  code  is  in  development. 

Further,  we  used  the  SPIRAL  code  generator  [2]  to  generate  numerically  robust  implementations  as 
input  to  our  compiler,  thus  automating  the  entire  design  flow  of  creating  high  accuracy  fixed  point  imple¬ 
mentations  for  linear  DSP  kernels.  Experiments  with  different  transforms  show  that  by  choosing  the  formats 
independently  (local  mode)  the  accuracy  can  be  improved  by  a  factor  of  up  to  5  in  terms  of  a  norm-based 
error  measure. 

Adaptive  Fixed-Point  Mapping  for  High  Accuracy 

Our  approach  to  generating  a  high  accuracy  fixed  point  implementation  for  a  DSP  transform  T  consists  of 
the  following  two  high-level  steps;  the  second  step  is  our  main  contribution. 

•  We  generate  a  numerically  robust  initial  floating  point  implementation  for  T  using  SPIRAL. 

•  We  translate  this  implementation  into  a  high  accuracy  fixed  point  implementation  using  the  input  range 
as  additional  information. 

Generating  a  Robust  Initial  Implementation.  SPIRAL  is  a  generator  for  fast,  platform- adapted  im¬ 
plementations  of  DSP  transforms  and  filters.  SPIRAL  operates  in  a  feedback  loop  that  generates,  for  a  given 
transform,  alternative  algorithms  and  implementations  to  find  the  best  match  to  the  given  platform.  The 
feedback  loop  is  driven  by  the  measured  runtimes  of  the  generated  codes;  by  replacing  it  with  a  norm-based 
accuracy  measure,  we  use  SPIRAL  to  generate  numerically  robust  code. 

Adaptive  Translation  into  Fixed  Point  Code.  To  translate  a  floating  point  implementation  into  fixed 
point  format,  the  crucial  task  is  to  determine  the  maximal  range  of  each  occurring  variable.  The  tool  in  [1] 
uses  affine  arithmetic  modeling  to  achieve  this  statically  with  a  single  pass  through  the  code.  The  basic  idea 
is  to  represent  each  variable  x  by  an  affine  expression 
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accuracy  when  cut  to  8  bits 


Figure  1:  Left:  accuracy  histogram  of  10,000  SPIRAL  generated  algorithms  for  a  DCT,  type  2,  size  32.  Right:  relative  accuracy  of 
“local  method”  vs.  ’’global  method”  (lower  is  better). 


where  the  e,  arc  random  variables  uniformly  distributed  in  [—1,1].  Intuitively,  some  of  the  ef s  capture  the 
range  of  each  variable  and  others  the  uncertainty  due  to  finite  precision  effects  in  the  computation.  Starting 
from  the  input,  these  affine  expressions  are  computed  for  every  variable  in  the  code;  each  finite  precision 
operation  introduces  a  new  error  variable.  For  example,  a  global  input  range  of  [-N,  N]  corresponds  to 
affine  expressions  of  the  form  0  +  Ne\,  i.e.,  at  the  input  the  entire  uncertainty  is  due  to  range.  For  further 
details  on  the  method  see  [1]. 

From  the  affine  expression  for  a  variable,  its  maximal  range  is  obtained  by  setting  all  to  1  and  -1. 
In  the  global  mode,  we  determine  the  number  of  integer  bits  through  the  largest  occurring  range  among  all 
variables.  In  the  local  mode,  the  format  of  each  variable  is  chosen  independently. 

Results 

Figure  1  (left)  shows  a  robustness  histogram  of  10,000  SPIRAL  generated  algorithms  for  a  DCT  (type 
2)  of  size  32.  The  robustness  measure  compares  a  floating  point  implementation  to  an  8-bit  fixed-point 
implementation  for  each  algorithm.1  Most  algorithms  are  within  a  factor  of  2-3.  Using  SPIRAL’s  search 
mechanism  we  generate  a  robust  algorithm  as  input  to  our  compiler. 

Figure  1  (right)  shows  the  benefit  of  choosing  independent  fixed  point  formats  (local  mode)  versus 
choosing  a  global  format.  Each  line  represents  a  transform;  the  .x-axis  shows  the  logarithm  of  the  chosen 
input  range  (e.g.,  10  means  a  range  of  [ — 210,  210]);  the  y-axis  shows  the  relative  accuracy  of  local  vs.  global. 
The  best  improvement  of  a  factor  of  3-5  is  achieved  for  a  real  DFT  (RDFT). 

Conclusion.  Our  compiler  achieves  two  main  goals  in  the  targeted  domain.  First,  we  free  the  developer 
from  choosing  a  suitable  fixed-point  format  by  hand.  Second,  we  obliterate  the  need  for  extensive  simula¬ 
tions,  since  the  generated  code  provably  avoids  overflow  by  construction.  By  using  our  compiler  as  backend 
to  SPIRAL,  the  design  flow  is  fully  automated. 

References 

[1]  Claire  F.  Fang,  Rob  A.  Rutenbar,  and  Tsuhan  Chen.  Fast,  Accurate  Static  Analysis  for  Fixed-Point  Finite  Precision 
Effects  in  DSP  Designs.  In  Proc.  ICCAD,  2003. 

[2]  M.  Piischel,  B.  Singer,  J.  Xiong,  J.  M.  F.  Moura,  J.  Johnson,  D.  Padua,  M.  Veloso,  and  R.  W.  Johnson.  SPIRAL: 
A  Generator  for  Platform- Adapted  Libraries  of  Signal  Processing  Algorithms.  International  Journal  of  High 
Performance  Computing  Applications,  1 8 ( 1 ) : 2 1  —45 ,  2004.  http://www.spiral.net. 
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Motivation 


■  Embedded  DSP  applications  (SW  and  HW)  typically  use  fixed- 
point  arithmetic  for  reduced  power/area  and  better  throughput 

■  Typically  DSP  algorithms  are  manually  mapped  to  fixed-point 
implementation 

■  time  consuming,  non-trivial  task 

■  difficult  trade-off  between  range  (to  avoid  overflow)  and 
precision 

■  usually  done  using  simulations  (not  an  exact  science) 

■  Our  goal:  automatically  generate  overflow-proof,  and  accurate 
fixed-point  code  (SW)  for  linear  DSP  kernels  using  the  SPIRAL 
code  generator 
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Background;  SPIRAL 


■  Generates  fast,  platform-adapted  code  for  linear  DSP 
transforms  (DFT,  DCTs,  DSTs,  filters,  DWT,  ...) 

■  Adapts  by  searching  in  the  algorithm  space  and 
implementation  space  for  the  best  match  to  the  platform 

■  Floating-point  code  only 

■  Our  goal:  extend  SPIRAL  to  generate  overflow-proof, 
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SPIRAL 
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Background:  Transform  Algorithms 

■  Reduce  computation  cost  from  0(n2)  to  0(n  log  n)  or  below 

■  For  every  transform  there  are  many  algorithms 

■  An  algorithm  can  be  represented  as 

■  Sparse  matrix  factorization 
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■  Data  flow  DAG  (Directed  Acyclic  Graph)  ■  Program 


tl  =  a  *  x2 
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Background:  Fixed-Point  Arithmetic 


Uses  integers  to  represent  fractional  numbers: 

IB  FB _ ^ 

Example  (RW=9,  IB=FB=4) 


sign 

integer  bits 

fractional  bits 

◄ - ► 

register  width:  RW  =  1  +  IB  +  FB  (typically  16  or  32) 

1  Operations 


0011  00112  =  1011.01112  =  3.187510 


a+b 


a  •  b  »  f  b 


addition  multiplication 

Dynamic  range: 

■  -2IB  ...  2IB-1 

■  much  smaller  than  in  floating-point )  risk  of  overflow 

Problem:  for  a  given  application,  choose  IB  (and  thus  FB)  to  avoid 
overflow 


We  present  an  algorithm  to  automatically  choose,  application 
dependent,  “best”  IB  (and  thus  FB)  for  linear  DSP  kernels 
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Overview  of  Approach 

■  Extension  of  SPIRAL  code  generator 

■  Fixed-point  mapping:  maps  floating-point  code  into  fixed-point 
code,  given  the  input  range 

■  Use  SPIRAL  to  automatically  search  for  the  fixed-point 
implementation 
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Tool:  Affine  Arithmetic 


89  Basic  idea:  propagate  ranges  through  the  computation 
(interval  arithmetic,  IA);  each  variable  becomes  an  interval 

■  Problem:  leads  to  range  overestimation,  since  correlations 
between  variables  are  not  considered 


■  Solution:  affine  arithmetic  (AA)  [1] 

■  represents  range  as  affine  expression 


■  captures  correlations 


jL 


Error 
r._iru!>  of  ■% 


Error 
ron^tof  v 


IA:  A(x)  =  [-M,M] 

AA:  A(x)  =  c0  E0  +c1E1+... 


Ej  are  ranges,  e.g.,Ej=[-1,1] 


[1]  Fang  Fang,  Rob  A.  Rutenbar, 

Markus  Puschel,  and  Tsuhan  Chen 

Toward  Efficient  Static  Analysis  of  Finite- 
Precision  Effects  in  DSP  Applications  via 
Affine  Arithmetic  Modeling 

Proc.  DAC  2003,  pp.  496-501 
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Algorithm  1  [Range  Propagation] 

■  Input:  Program  with  additions  and  multiplications  by 

constants,  ranges  of  inputs 

■  Output:  Ranges  of  outputs  and  intermediate  results 

■  Denote  input  ranges  by  X,  with  i2[1,N] 

■  We  represent  all  variables  v  as  affine  expressions  A: 

A(v)  =  Z^=Q  Ci  ■  Xi  where  Cj  are  constants 

"  Traverse  all  variables  from  input  to  output,  and  compute  A: 

A(xi)  =  Xi 

A(vi+v2)  =  A(vi)  +  A(v2) 

A(c  •  v)  =  c  •  A(v) 


■  Variable  ranges  R=[Rmin,Rmax]  are  given  by 

^min  =  ^minQ 2ici'xi)  —  \ci\ ‘^min(xi) 
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Example 


Program 

Affine  Expressions 

tl  =  xl  +  x2 

A(tl)  =  xl  +  x2 

t2  =  xl  -  x2 

A(t2)  =  xl  -  x2 

yl  =  1.2  *  tl 

A  (yl)  =  1.2  xl  +  1.2  x2 

y2  =  -2.3  *  t2 

A(y2)  =  -2.3  xl  +  2.3  x2 

y3  =  yl  +  y2 

A(y3)  =  -1.1  xl  +  3.5  x2 

4- 


Given  Ranges 
R(xl)  =  [-1,1] 
R(x2)  =  [-1,1] 


Computed  Ranges 


R (tl)  = 

[-2, 

r  2  ] 

R(t2)  = 

[-2, 

r  2  ] 

R (yl)  = 

[-2. 

.4,2, 

.4] 

R(y2)  = 

[-2. 

.6,2, 

.6] 

R(y3)  = 

[-4. 

.6,4, 

.6] 

ranges  are  exact  (not  worst  cases) 
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Algorithm  2  [Error  Propagation] 

I  ■  Input:  Program  with  additions  and  multiplications  by 
I  constants,  ranges  of  inputs 

■  Output:  Error  bounds  on  outputs  and  intermediate  results 


■  Denote  by  e,  in  [-1,1]  independent  random  error  variables 

■  We  augment  affine  expressions  A  with  error  terms: 


AO)  =  E™=0  ci  '  xi  +  Ej  fj  ■  £j 


where  fj  are  error 
magnitude  constants 


Traverse  all  variables  from  input  to  output,  and  compute  Ae: 

AOi)  =  Xi  f 


AOi+w2)  =  AOl)  +  AO2)  +  2  I'K-maxOl  +  «2)le 
AO  •  v)  =  c-yte(t>)  +  2  rty|7?,max(c  •  w)|e 


new  error  variable  introduced 


Maximum  error  is  given  by  £(v  )  =  Ej  I  fj 
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Fixed-Point  Mapping 


■  Input: 

■  floating  point  program  (straightline  code)  for  linear  transform 

■  ranges  of  input 

■  Output:  fixed-point  program 

■  Algorithm: 

■  Determine  the  affine  expressions  of  all  intermediate  and  output 
variables;  compute  their  maximal  ranges 

■  Mode  1:  Global  format 

■  the  largest  range  determines  the  fixed  point  format  globally 

■  Mode  2:  Local  format 

■  allow  different  formats  for  all  intermediate  and  output 
variables 

■  Convert  floating-point  constants  into  fixed-point  constants 

■  Convert  floating-point  operations  into  fixed-point  operations 

■  Output  fixed-point  code 
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Accuracy  Measure 


Goal:  evaluate  a  SPIRAL  generated  fixed-point  program  for 
accuracy  to  enable  search  for  best  =  most  accurate  algorithm 

Choose  input  independent  accuracy  measure:  matrix  norm 


T-T 


max  row  sum  norm 


00 


matrix  for  exact 
(floating-point)  program 


matrix  for 
fixed-point  program 


Note:  can  be  used  to  derive  input  dependent  error  bounds 


y-y 


00 


<  T-T 


00 


X 


00 
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Probabilistic  Analysis 


Fixed  point  mapping  chooses  range  conservatively,  namely: 

A(x )  =  C0X0  +  CjJCj  H — 


leads  to  a  range  estimate  of 


2>,|min(|*,|),X|  c.  |  max(|  x.  |) 


However:  not  all  values  in  [-M,M]  are  equally  likely 


Analysis: 

■  Assume  xi  are  uniformly  distributed,  independent  random 
variables 


Use  Central  Limit  Theorem:  A(x)  is  approximately  Gaussian 

Extend  Fixed-Point  Mapping  to  include  a  probabilistic  mode 
(range  satisfied  with  given  probability  p) 
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Overestimation  due  to  Central  Limit  Theorem 


DCTType  2  Size  4  -  Sample  Output  Variable  Range  Distribution 
min  max 


affine 

expression 

with: 

4  terms 


16  terms 


64  terms 


assuming  input/error  variables  are  independent 
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Accuracy  Histogram 


DCT,  size  32 

10,000  random  algorithms 
Spiral  generated 


■  Spread  lOx,  most  within  2x 

■  Need  for  search 


local  mode  aocuracy/ global  mode  aocuracy 


Global  vs.  Local  Mode 


several 

transforms 


local  mode  a  factor  of  1.5-2  better 


gaussian  local  mode  error  (p  =  99.99%)  /  local  mode  error 


Local  vs.  Gaussian  Local  Mode 


99.99% 
confidence 
for  each 
variable 
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Summary 


■  An  automatic  method  to  generate  accurate,  overflow-proof  fixed- 
point  code  for  linear  DSP  kernels 

■  Using  SPIRAL  to  find  the  most  accurate  algorithm:  2x 

■  Floating-point  to  fixed-point  using  affine  arithmetic  analysis 
(global,  local:  2x,  probabilistic:  4x) 

■  16x 


■  Current  work: 

■  Extend  approach  to  handle  loop  code  and  thus  arbitrary  size  transforms 

■  Refine  probabilistic  mode  to  get  statements  as: 

prob(overflow)  <  p 


■  Further  down  the  road: 

■  Fixed-point  mapping  compiler  for  more  general  numerical  DSP 
kernels/applications 
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